E-mail Classification by Decision Forests
Authors
Abstract
We investigate the use of decision forests for automated e-mail filing into folders and for junk e-mail filtering. The experiments show that decision forests offer the following advantages: (i) the ability to deal with the high dimensionality of feature vectors in text categorization, (ii) improved accuracy of the ensemble over single decision trees, and a favourable comparison with a number of other highly accurate classifiers, including neural networks and boosted decision trees, and (iii) acceptable computational cost.
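The abstract does not describe how the forests are grown, so the following is only a minimal sketch under stated assumptions: it uses a Breiman-style random forest (a bootstrap sample per tree, a random subset of about sqrt(|vocabulary|) candidate words scored with Gini impurity at each split, and a majority vote across trees) over binary bag-of-words features. The random subset of candidate words is one way to keep each split cheap when the vocabulary is large (point (i)), and the vote is what combines individual trees into the ensemble (point (ii)). All function names, the `junk`/`work` folder labels and the toy messages are hypothetical illustrations, not the authors' method or data.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Binary bag-of-words: the set of lower-cased whitespace tokens."""
    return set(text.lower().split())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(docs, labels, vocab, rng, max_depth, min_size=2):
    """Grow one tree; each internal node tests 'does the message contain word w?'.

    Returns a class label (leaf) or a tuple (word, subtree_if_absent, subtree_if_present).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(labels) < min_size or len(set(labels)) == 1 or not vocab:
        return majority
    # Random-subspace step: score only about sqrt(|vocab|) randomly chosen words.
    k = max(1, int(math.sqrt(len(vocab))))
    best_word, best_score = None, gini(labels)
    for word in rng.sample(vocab, k):
        present = [y for d, y in zip(docs, labels) if word in d]
        absent = [y for d, y in zip(docs, labels) if word not in d]
        if not present or not absent:
            continue  # this word does not split the node
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / len(labels)
        if score < best_score:
            best_word, best_score = word, score
    if best_word is None:
        return majority  # no sampled word reduced impurity
    yes = [i for i, d in enumerate(docs) if best_word in d]
    no = [i for i, d in enumerate(docs) if best_word not in d]

    def grow(idx):
        return build_tree([docs[i] for i in idx], [labels[i] for i in idx],
                          vocab, rng, max_depth - 1, min_size)

    return (best_word, grow(no), grow(yes))


def predict_tree(tree, doc):
    """Follow word-presence tests down to a leaf label."""
    while isinstance(tree, tuple):
        word, if_absent, if_present = tree
        tree = if_present if word in doc else if_absent
    return tree


def train_forest(docs, labels, n_trees=25, max_depth=6, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training messages."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*docs))
    n = len(docs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(build_tree([docs[i] for i in idx], [labels[i] for i in idx],
                                 vocab, rng, max_depth))
    return forest


def predict_forest(forest, doc):
    """Majority vote over the trees in the ensemble."""
    return Counter(predict_tree(t, doc) for t in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy messages labelled with hypothetical folder names.
    messages = [
        ("win a free prize now", "junk"),
        ("free money claim your prize", "junk"),
        ("cheap pills free shipping", "junk"),
        ("project meeting moved to friday", "work"),
        ("draft report attached for review", "work"),
        ("meeting notes and report draft", "work"),
    ]
    docs = [tokenize(text) for text, _ in messages]
    labels = [label for _, label in messages]
    forest = train_forest(docs, labels)
    print(predict_forest(forest, tokenize("claim your free prize")))
```

The same code covers both tasks in the abstract: junk filtering is the two-class case, and folder filing simply uses more distinct labels, since the Gini score and the vote work unchanged for any number of classes.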
Similar resources
Voting-based Classification for E-mail Spam Detection
The problem of spam e-mail has gained a tremendous amount of attention. Although organizations tend to use spam-filtering applications to filter out incoming spam, marketing companies still send unsolicited e-mails in bulk, and users still receive a considerable amount of spam despite those filtering applications. This work proposes a new method for classifying emails into spa...
Classifying Very-High-Dimensional Data with Random Forests of Oblique Decision Trees
The random forests method is one of the most successful ensemble methods. However, random forests do not perform well on very-high-dimensional data in the presence of dependencies. In this case one can expect many combinations of the variables to exist, and unfortunately the usual random forests method does not effectively exploit this situation. We here investig...
THE FLORA OF THREATENED BLACK ALDER FORESTS IN THE CASPIAN LOWLANDS, NORTHERN IRAN
The Caspian (Hyrcanian) lowland forest zone in northern Iran is characterized by small remnant alder forest communities, dominated or subdominated by a Euxino-Hyrcanian element, Alnus glutinosa ssp. barbata. The first floristic inventory of these alder forests in northern Iran is presented. The floristic catalogue is based on data from 133 phytosociological relevés in eight different alder...
Population variation of Artemisia sieberi in Iran based on quantitative characters of leaf and seed and their relationships with habitat features
Thirty-four populations of Artemisia sieberi from 10 provinces of Iran were investigated with respect to quantitative characteristics of leaves and seeds. In each habitat, five plants were randomly selected and some branches were harvested for studying leaf characteristics in spring and seed characteristics in autumn. Principal features of climate and soil were studied in each habitat. In order ...
Learning to classify e-mail
In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large an...